EAKOS is a collection of tools that demonstrates how to interface with
web-based visualization and GIS services. The toolset is an early prototype
developed by Lorne Leonard in his spare time during the 2008 Christmas break
and on the weekends leading up to the competition deadline.
Lorne works with researchers and faculty at The Pennsylvania State
University and uses the toolset to demonstrate potential visualization and
analytical solutions that support their research goals.
Video:
Leonard_Vast2009_Challenge1.mov
ANSWERS:
MC1.1: Identify which computer(s) the employee most likely used to send
information to his contact in a tab-delimited table which contains for each
computer identified: when the information was sent, how much information was
sent and where that information was sent. Please name the file Traffic.txt and
place it in the same directory as your index.htm file. Please see the format
required in the Task Descriptions.
MC1.2: Characterize the patterns of behavior of suspicious computer use.
I approached this challenge by identifying who went against policy and entered
the building, and more importantly the classified room, without badging in.
Using the "Prox Card" dataset, I plotted the sequence of daily activities per
employee ID based on the Prox card readers, as demonstrated in Figure 1. Coding
this tool took approximately three days. IDs that "piggybacked" in/out of the
classified room are highlighted in orange (for the entire day), and the event
is linked with a yellow line, as shown in Figure 1.
Figure 1: Prox Card data for ID 30 for the entire month duration.
Automatically highlighting the days when IDs broke protocol made it quick and
easy (about 5 minutes) to scroll through the 60 IDs and identify that IDs 30,
38 and 49 broke the protocol by piggybacking in/out of the classified room
(Table 1). ID 30 was the worst offender, with three piggyback events into the
classified room. What was this person doing between entering/leaving the
classified room and the next event?
ID | Day | Event | Event Start Time | Next Event
30 | 10 | Did not prox in-classified | 10:33 AM | 5:05 PM
30 | 17 | Did not prox in-classified | 11:31 AM | 2:03 PM
30 | 24 | Did not prox in-classified | 9:00 AM | 10:52 AM
38 | 4 | Did not prox out-classified | 1:12 PM | 2:15 PM
49 | 8 | Did not prox out-classified | 12:56 PM | 2:06 PM
Table 1: IDs who piggybacked within the classified room.
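The highlighting rule behind Table 1 can be approximated with a small state machine over one employee's badge events. This is a minimal sketch, not the EAKOS implementation: the event labels (`prox-in-building`, `prox-in-classified`, `prox-out-classified`), the function name and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime


def find_piggyback_events(events):
    """Flag classified-room badge events that imply a missed badge swipe.

    events: list of (timestamp, event_type) tuples for ONE employee,
    already sorted by time. Event type labels are assumed, not confirmed.
    """
    anomalies = []
    in_classified = False
    for timestamp, event_type in events:
        if event_type == "prox-in-classified":
            # Badging in while already inside means the exit was never badged.
            if in_classified:
                anomalies.append((timestamp, "Did not prox out-classified"))
            in_classified = True
        elif event_type == "prox-out-classified":
            # Badging out without being inside means the entry was never badged.
            if not in_classified:
                anomalies.append((timestamp, "Did not prox in-classified"))
            in_classified = False
        elif event_type == "prox-in-building":
            # Re-entering the building while still marked as inside the room.
            if in_classified:
                anomalies.append((timestamp, "Did not prox out-classified"))
            in_classified = False
    return anomalies


# Hypothetical sample sequence, not taken from the challenge data.
sample = [
    (datetime(2008, 1, 2, 8, 0), "prox-in-building"),
    (datetime(2008, 1, 2, 10, 30), "prox-out-classified"),
    (datetime(2008, 1, 2, 13, 0), "prox-in-classified"),
    (datetime(2008, 1, 2, 15, 0), "prox-in-classified"),
]
for when, label in find_piggyback_events(sample):
    print(when.isoformat(), label)
```

Each anomaly is reported at the event that reveals it, so the reported time is when the inconsistency becomes visible in the log, not necessarily when the unbadged entry or exit took place.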
Each of the
three events for ID 30 occurred on a Thursday. Is this an arranged agreement
with his handlers?
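The weekday observation is easy to check programmatically. The sketch below assumes the Prox Card data covers January 2008, which is inferred from the AccessTime values in Table 3; the helper name is hypothetical.

```python
from datetime import date

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def weekday_name(year, month, day):
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAY_NAMES[date(year, month, day).weekday()]


# Days of ID 30's piggyback events from Table 1 (month and year assumed).
for day in (10, 17, 24):
    print(day, weekday_name(2008, 1, day))
```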
Figure 2: Prox Card data for IDs 38 and 49 for the entire month duration.
To investigate who piggybacked into the building in the morning, I manually
inspected which days lacked a grey marker at the start of the sequence. These
results, shown in Table 2, took approximately 10 minutes to generate.
ID | Day(s)
0 | 17
7 | 2
13 | 8,23
27 | 24
37 | 24
38 | 3
39 | 24
48 | 23
49 | 8,22,31
50 | 30
51 | 2
54 | 16
55 | 16
58 | 31
59 | 31
Table 2: IDs who piggybacked in the morning.
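The manual inspection behind Table 2 could also be automated. The sketch below assumes that the grey marker corresponds to a `prox-in-building` event, which the text does not state, and flags every day on which an employee's first recorded event is something else. The record layout, function name and sample IDs are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def days_without_building_badge_in(records):
    """Return {employee_id: [day_of_month, ...]} for days whose first
    recorded event is not a building badge-in.

    records: iterable of (timestamp, employee_id, event_type) tuples.
    """
    first_event = {}
    # Sorting by timestamp makes the first event seen per key the earliest.
    for timestamp, emp_id, event_type in sorted(records):
        first_event.setdefault((emp_id, timestamp.date()), event_type)

    flagged = defaultdict(list)
    for (emp_id, day), event_type in sorted(first_event.items()):
        if event_type != "prox-in-building":
            flagged[emp_id].append(day.day)
    return dict(flagged)


# Hypothetical records, not taken from the challenge data.
records = [
    (datetime(2008, 1, 2, 8, 0), 101, "prox-in-building"),
    (datetime(2008, 1, 2, 9, 0), 101, "prox-in-classified"),
    (datetime(2008, 1, 3, 9, 30), 101, "prox-in-classified"),
    (datetime(2008, 1, 3, 8, 15), 102, "prox-in-building"),
]
print(days_without_building_badge_in(records))
```

Days on which an employee has no events at all are not flagged by this rule, since there is no first event to inspect.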
Having identified the IDs who piggybacked into the classified room, I developed
another tool to help identify ID connections, traffic amounts and event
times. This took about four days to code, load the data and visualize. I
visualized the "IP traffic" dataset in two ways. The first (Figure 3) plots
source IPs against destination IP response and request sizes as a Treemap.
This tool also includes the embassy plan showing the rooms assigned to each ID,
and a sequence plot, drawn against the embassy plan, of each ID entering the
building and going in/out of the classified room.
By referring to this sequence diagram and the results from the first tool
above, I can identify when the piggybacking occurred. Furthermore, by hovering
the mouse over the larger Treemap cells, I can identify when the IP traffic
happened and manually determine whether the event falls around my time of
interest. The second visualization method (Figure 4) plots response versus
request payloads per ID within a specified date range. The purpose of this tool
is to help determine whether the traffic event identified with the first
visualization method is an outlier. By manually mousing over the outliers, I
can identify whether these events happened near the piggyback event. This
process took approximately 30 minutes to mouse over and collect the data, and
the results are shown in Table 3.
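The outlier judgment described above was made by eye on the scatter plot. As one possible programmatic stand-in (not the method used in the tool), the sketch below flags payload sizes with a modified z-score based on the median absolute deviation. The 0.6745 scaling and the 3.5 cut-off are a common rule of thumb, and the function name and sample sizes are hypothetical.

```python
from statistics import median


def flag_payload_outliers(sizes, threshold=3.5):
    """Return the payload sizes whose modified z-score exceeds threshold."""
    med = median(sizes)
    # Median absolute deviation: a spread estimate robust to extreme values.
    mad = median(abs(s - med) for s in sizes)
    if mad == 0:
        # At least half of the values equal the median; the score is undefined.
        return []
    return [s for s in sizes if 0.6745 * abs(s - med) / mad > threshold]


# Hypothetical response sizes in bytes, not taken from the challenge data.
sizes = [1000, 1200, 900, 1100, 1050, 500000]
print(flag_payload_outliers(sizes))
```

A median-based score is used here because a single very large transfer would inflate a mean and standard deviation enough to hide itself.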
Figure 3: Traffic sizes using a Treemap and ID sequence within embassy.
Figure 4: Plotting response versus request payloads.
Source IP | AccessTime | DestIP | Socket | ReqSize | RespSize | Count
37.170.100.30 | 2008-01-10T10:35:10.367 | 10.30.138.140 | 80 | 3817 | 569386 | 29
37.170.100.30 | 2008-01-10T14:29:11.316 | 101.160.27.28 | 80 | 4202 | 335984 | 12
37.170.100.30 | 2008-01-10T15:00:15.208 | 100.104.83.89 | 80 | 44979 | 1751371 | 2
37.170.100.30 | 2008-01-10T15:39:57.166 | 105.77.95.226 | 80 | 56278 | 606589 | 3
37.170.100.30 | 2008-01-10T16:09:56.584 | 105.211.108.147 | 80 | 4953 | 1597330 | 10
37.170.100.30 | 2008-01-10T16:12:03.871 | 103.143.114.91 | 80 | 53081 | 599157 | 4
37.170.100.30 | 2008-01-10T16:49:38.494 | 10.124.235.51 | 80 | 44364 | 893780 | 4
37.170.100.30 | 2008-01-10T16:56:40.458 | 103.120.93.59 | 80 | 16568 | 264422 | 4
37.170.100.30 | 2008-01-17T12:38:41.768 | 104.73.180.170 | 80 | 6000 | 404896 | 21
37.170.100.30 | 2008-01-17T12:38:51.905 | 37.105.202.184 | 80 | 65533 | 27996 | 5
37.170.100.30 | 2008-01-17T13:36:47.489 | 103.76.60.0 | 80 | 5076 | 10384 | 2
37.170.100.30 | 2008-01-17T13:36:53.933 | 10.30.138.140 | 80 | 5627 | 371604 | 29
37.170.100.30 | 2008-01-24T14:46:16.832 | 100.226.208.157 | 80 | 4879 | 156342 | 13
37.170.100.30 | 2008-01-24T14:46:25.842 | 10.228.35.56 | 80 | 55712 | 55412 | 1
37.170.100.30 | 2008-01-24T17:18:16.094 | 10.30.138.140 | 80 | 6789 | 5104487 | 29
37.170.100.38 | 2008-01-04T17:28:43.475 | 37.109.133.151 | 80 | 63665 | 1880595 | 2
37.170.100.49 | 2008-01-08T16:21:30.114 | 106.192.237.252 | 80 | 5502 | 1356516 | 3
Table 3: IDs who piggybacked and possible payload outliers.
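Checking whether a traffic event falls inside a piggyback window, done above by mousing over the plots, amounts to a simple time-range filter. The sketch below uses three rows from Table 3 and the day-17 window for ID 30 from Table 1 (11:31 AM to 2:03 PM). It assumes that day 17 means 2008-01-17 and that source IP 37.170.100.30 belongs to ID 30; the dictionary keys and function name are hypothetical.

```python
from datetime import datetime


def traffic_in_window(rows, source_ip, start, end):
    """Return rows sent by source_ip with start <= access time <= end."""
    return [
        row for row in rows
        if row["source_ip"] == source_ip and start <= row["access_time"] <= end
    ]


# Three rows copied from Table 3 (only the columns used here).
rows = [
    {"source_ip": "37.170.100.30",
     "access_time": datetime.fromisoformat("2008-01-17T12:38:41.768"),
     "dest_ip": "104.73.180.170", "resp_size": 404896},
    {"source_ip": "37.170.100.30",
     "access_time": datetime.fromisoformat("2008-01-17T13:36:53.933"),
     "dest_ip": "10.30.138.140", "resp_size": 371604},
    {"source_ip": "37.170.100.30",
     "access_time": datetime.fromisoformat("2008-01-24T17:18:16.094"),
     "dest_ip": "10.30.138.140", "resp_size": 5104487},
]

# Window from Table 1: ID 30, day 17, event start 11:31 AM, next event 2:03 PM.
window_start = datetime(2008, 1, 17, 11, 31)
window_end = datetime(2008, 1, 17, 14, 3)

for row in traffic_in_window(rows, "37.170.100.30", window_start, window_end):
    print(row["access_time"].isoformat(), row["dest_ip"], row["resp_size"])
```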
As previously mentioned, ID 30 is the worst offender, with piggyback events
occurring on days 10, 17 and 24. The Count column in Table 3 indicates the
number of times the source IP contacted the destination IP during the month.
The most suspicious destination IP is 10.30.138.140, to which ID 30 responded
with large amounts of data on days 10, 17 and 24. Additionally, ID 30 contacts
this destination on a regular basis (29 times during the month), and he/she
appears to be masking the activity by visiting another destination IP at nearly
the same time (highlighted in yellow).
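One way to surface candidate "masking" pairs is to look for consecutive requests from the same source to different destinations within a few seconds of each other. The sketch below applies this to the four 2008-01-17 rows from Table 3. The 15-second gap and the function name are assumptions chosen for illustration, not values from the analysis.

```python
from datetime import datetime, timedelta


def near_simultaneous_pairs(requests, max_gap=timedelta(seconds=15)):
    """Find consecutive requests to different destinations within max_gap.

    requests: list of (access_time, dest_ip) tuples for ONE source IP.
    Returns (first_dest, second_dest, gap_in_seconds) tuples.
    """
    ordered = sorted(requests)
    pairs = []
    for (t1, ip1), (t2, ip2) in zip(ordered, ordered[1:]):
        if ip1 != ip2 and t2 - t1 <= max_gap:
            pairs.append((ip1, ip2, (t2 - t1).total_seconds()))
    return pairs


# AccessTime and DestIP values copied from the 2008-01-17 rows of Table 3.
day_17 = [
    (datetime.fromisoformat("2008-01-17T12:38:41.768"), "104.73.180.170"),
    (datetime.fromisoformat("2008-01-17T12:38:51.905"), "37.105.202.184"),
    (datetime.fromisoformat("2008-01-17T13:36:47.489"), "103.76.60.0"),
    (datetime.fromisoformat("2008-01-17T13:36:53.933"), "10.30.138.140"),
]
for first, second, gap in near_simultaneous_pairs(day_17):
    print(first, second, gap)
```

A short gap alone does not show intent, since a single web page can trigger requests to several hosts, so any pair found this way is only a lead for closer inspection.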
There is no activity near the piggyback event for IDs 38 and 49. Perhaps they
are leaving the building with the data. The majority of the larger response
payloads happen either before the piggyback event or some time after another
visit to the classified room. A possible scenario is that either ID left the
building to meet his/her contacts, discuss the collected data and confirm that
it meets their needs before transmitting the information later that same day.
If true, the event may have been at 2008-01-04T17:28:43.475 to 37.109.133.151
for ID 38 and at 2008-01-08T16:21:30.114 to 106.192.237.252 for ID 49. However,
if you compare the payload events for the entire month for both IDs, the
amounts do not appear significant (Figure 5).